ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation
Editing real facial images is a crucial task in computer vision with
significant demand in various real-world applications. While GAN-based methods
have shown potential in manipulating images, especially when combined with
CLIP, these methods are limited in their ability to reconstruct real images due
to the difficulty of GAN inversion. Despite the successful image
reconstruction achieved by diffusion-based methods, there are still challenges
in effectively manipulating fine-grained facial attributes with textual
instructions. To address these issues and facilitate convenient manipulation of
real facial images, we propose a novel approach that conducts text-driven image
editing in the semantic latent space of a diffusion model. By aligning the
temporal feature of the diffusion model with the semantic condition during the
generative process, we introduce a stable manipulation strategy that performs
precise zero-shot manipulation effectively. Furthermore, we develop an
interactive system named ChatFace, which combines the zero-shot reasoning
ability of large language models to perform efficient manipulations in
diffusion semantic latent space. This system enables users to perform complex
multi-attribute manipulations through dialogue, opening up new possibilities
for interactive image editing. Extensive experiments confirmed that our
approach outperforms previous methods and enables precise editing of real
facial images, making it a promising candidate for real-world applications.
Project page: https://dongxuyue.github.io/chatface
MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation
Unsupervised domain adaptation has been widely adopted in tasks with scarce
annotated data. Unfortunately, mapping the target-domain distribution to the
source-domain unconditionally may distort the essential structural information
of the target-domain data, leading to inferior performance. To address this
issue, we propose, for the first time, to introduce active sample selection to
assist domain adaptation for the semantic segmentation task. By adopting
multiple anchors instead of a single centroid, both source and target domains
can be better characterized as multimodal distributions, so that more
complementary and informative samples are selected from the target domain. With
only a little workload to manually annotate these active samples, the
distortion of the target-domain distribution can be effectively alleviated,
achieving a large performance gain. In addition, a powerful semi-supervised
domain adaptation strategy is proposed to alleviate the long-tail distribution
problem and further improve the segmentation performance. Extensive experiments
are conducted on public datasets, and the results demonstrate that the proposed
approach outperforms state-of-the-art methods by large margins and achieves
similar performance to the fully-supervised upper bound, i.e., 71.4% mIoU on
GTA5 and 71.8% mIoU on SYNTHIA. The effectiveness of each component is also
verified by thorough ablation studies.
Comment: Accepted by TPAMI-IEEE Transactions on Pattern Analysis and Machine
Intelligence. arXiv admin note: substantial text overlap with
arXiv:2108.0801
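
The multi-anchor idea in the MADAv2 abstract can be illustrated with a minimal sketch. The abstract does not say how anchors are computed or how samples are scored, so everything here is an assumption made for illustration: anchors are taken to be k-means cluster centers of source-domain feature vectors, and the "informative" target samples are taken to be those farthest from their nearest source anchor. The function names `kmeans_anchors` and `select_active_samples` are hypothetical and do not come from the paper.

```python
import math
import random


def kmeans_anchors(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k cluster centers serve as anchors.

    Assumption: the abstract does not name a clustering method,
    k-means is used here only as a simple stand-in.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(features, k)]
    for _ in range(iters):
        # Assign every feature vector to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in features:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for j, members in enumerate(clusters):
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def select_active_samples(target_features, source_anchors, budget):
    """Return indices of the `budget` target samples that lie farthest
    from their nearest source anchor (an assumed selection criterion)."""
    nearest = [
        min(math.dist(p, a) for a in source_anchors) for p in target_features
    ]
    order = sorted(range(len(target_features)), key=lambda i: -nearest[i])
    return order[:budget]
```

With a single centroid, a source domain made of two well-separated modes would be summarized by a point lying between them. With two anchors, each mode gets its own reference point, so a target sample is only flagged for annotation when it is far from every mode.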